HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences
Similar resources
HPC-CLUST: Distributed hierarchical clustering for very large sets of nucleotide sequences
Motivation: Nucleotide sequence data is being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis – intended to reduce redundancy, define gene families, or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are de...
HPC-CLUST: distributed hierarchical clustering for large sets of nucleotide sequences
Motivation: Nucleotide sequence data are being produced at an ever increasing rate. Clustering such sequences by similarity is often an essential first step in their analysis, intended to reduce redundancy, define gene families or suggest taxonomic units. Exact clustering algorithms, such as hierarchical clustering, scale relatively poorly in terms of run time and memory usage, yet they are desir...
Approximating Hierarchical MV-sets for Hierarchical Clustering
The goal of hierarchical clustering is to construct a cluster tree, which can be viewed as the modal structure of a density. For this purpose, we use a convex optimization program that can efficiently estimate a family of hierarchical dense sets in high-dimensional distributions. We further extend existing graph-based methods to approximate the cluster tree of a distribution. By avoiding direct...
Efficient Hierarchical Clustering of Large Data Sets Using P-trees
Hierarchical clustering methods have attracted much attention by giving the user a maximum amount of flexibility. Rather than requiring parameter choices to be predetermined, the result represents all possible levels of granularity. In this paper a hierarchical method is introduced that is fundamentally related to partitioning methods, such as k-medoids and k-means as well as to a density based...
Self-Organizing Clustering: A Novel Non-Hierarchical Method for Clustering Large Amount of DNA Sequences
To cluster and characterize DNA sequences focusing on the oligonucleotide frequency, we developed a novel method and program package designated as Self-Organizing Clustering (SOC) [4]. Being based on Self-Organizing Map (SOM) [1, 2, 3], the algorithm of SOC made use of K-means with modification. In the SOC, the oligonucleotide frequency was regarded as a series of oligonucleotide patte...
Journal
Journal title: Bioinformatics
Year: 2013
ISSN: 1460-2059,1367-4803
DOI: 10.1093/bioinformatics/btt657